This repository has been archived by the owner on Jun 9, 2024. It is now read-only.

First commit for AutoGPT Benchmarks #1

Merged · 3 commits · Apr 17, 2023

Conversation

dschonholtz (Contributor) commented:

The highlights here are as follows:

  1. Run AutoGPT in a Docker container and give it a standard prompt for every task in an eval.
  2. Run it in continuous mode based on a config, so it doesn't ask for user input.
  3. Map all of that to an OpenAI completionFn so that, when we want to run an OpenAI (and later our own) eval, we can just point it at the eval with: EVALS_THREADS=1 EVALS_THREAD_TIMEOUT=600 oaieval auto_gpt_completion_fn <EVAL_NAME> --registry_path $PWD/auto_gpt_benchmarking (a sketch of such a completion function follows below).
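
For readers unfamiliar with the OpenAI Evals plumbing, here is a minimal sketch of what such a completion function can look like. It assumes the `evals` package's `CompletionResult` interface and a locally built Auto-GPT Docker image; the image name, flags, and timeout are illustrative assumptions, not the exact code in this PR.

```python
import subprocess

from evals.api import CompletionResult


class AutoGPTCompletionResult(CompletionResult):
    """Wraps the agent's final output so Evals can grade it."""

    def __init__(self, response: str) -> None:
        self.response = response

    def get_completions(self) -> list[str]:
        # Evals expects a list of completion strings to grade.
        return [self.response]


class AutoGPTCompletionFn:
    """Satisfies the Evals CompletionFn protocol by shelling out to Auto-GPT."""

    def __call__(self, prompt, **kwargs) -> AutoGPTCompletionResult:
        # Run Auto-GPT in continuous mode inside Docker so it never pauses
        # for user input; the prompt becomes the agent's task.
        # "auto-gpt-benchmark" is a hypothetical image name.
        proc = subprocess.run(
            ["docker", "run", "--rm", "-i", "auto-gpt-benchmark", "--continuous"],
            input=str(prompt),
            capture_output=True,
            text=True,
            timeout=600,
        )
        return AutoGPTCompletionResult(proc.stdout)
```

Registered under the name auto_gpt_completion_fn in the registry passed via --registry_path, this is roughly the object the oaieval command above resolves and calls once per eval sample.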

This generates results somewhere... I'm still working on finding where they land.

Quoted from the diff (the tail of a docstring, followed by the next method):

```python
If the model has used more than 50,000 tokens, it kills the model.
If the model has used less than 50,000 tokens, it returns the output.txt file.
"""

def _clean_up_workspace(self):
    ...
```
dschonholtz (Contributor, Author) commented:

Need to change this to clear everything out of the dir. It's brittle as-is: other files sometimes get created, and they confuse future agents.
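
Taken together, the quoted docstring and the comment above describe two small pieces of harness logic: a token-budget guard and a full workspace reset. Here is a minimal sketch of both, using hypothetical names (collect_result, reset_workspace, kill_agent) rather than the PR's actual code:

```python
import shutil
from pathlib import Path

TOKEN_BUDGET = 50_000  # the 50,000-token cutoff from the docstring


def collect_result(tokens_used: int, workspace: Path, kill_agent) -> str | None:
    """Return output.txt if the run stayed under budget, else kill the agent."""
    if tokens_used > TOKEN_BUDGET:
        # Over budget: stop the agent instead of letting it keep spending tokens.
        kill_agent()
        return None
    # Under budget: hand back whatever the agent wrote to output.txt.
    return (workspace / "output.txt").read_text()


def reset_workspace(workspace: Path) -> None:
    """Clear everything out of the dir, as the comment above suggests,
    so stray files from one run can't confuse the next agent."""
    shutil.rmtree(workspace, ignore_errors=True)
    workspace.mkdir(parents=True)
```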

@dschonholtz merged commit 22d997d into Significant-Gravitas:master on Apr 17, 2023.